Differential Performance by Gender in Foreign Language Testing

Authors

  • Jie Lin
  • Fenglan Wu
Abstract

Understanding and accounting for gender performance differences on high-stakes examinations has become a particular concern for educational researchers seeking to ensure test fairness for all examinees. In the context of second/foreign language proficiency testing, research (Ryan and Bachman, 1992) suggests that males and females do not react differently at the item level. However, as Nandakumar (1993) suggested, items with small but systematic differential item functioning (DIF) may often go statistically unnoticed individually, yet be detected when combined at the bundle level. Thus, a study of differential bundle functioning (DBF) becomes necessary in order to understand more fully the influence of gender on test performance, especially since important, although perhaps subtle, secondary dimensions associated with different testlets have been found in the TOEFL (Dunbar, 1982; Hale, Rock, & Jirele, 1989; McKinley & Way, 1992). In the present study of the English Proficiency Test in China, the computer program SIBTEST was employed for the DIF/DBF analyses and DIMTEST for dimensionality testing. The results indicate that although the English Proficiency Test did not demonstrate much gender DIF, the SIBTEST and DIMTEST analyses identified and confirmed the presence of a listening comprehension bundle clearly favouring females, and of grammar and vocabulary and cloze bundles slightly favouring males.

Introduction

Understanding and accounting for possible gender differences has become a particular concern for educational researchers seeking to ensure test fairness for all examinees. In the context of second/foreign language proficiency testing, however, gender differences have been explored only to a limited degree.

Ryan and Bachman (1992) studied differential performance on two well-known international tests, the TOEFL (the Test of English as a Foreign Language) and the FCE (the First Certificate of English). They found little evidence that males and females reacted differently at the item level to either test. Similar results were reported when the reading comprehension testlet of the TOEFL was studied (Wainer & Lukhele, 1997). However, as Wainer and Lukhele (1997) suggested, “it is not sufficient to merely examine each item for DIF [differential item functioning], but the testlet itself must be examined in its totality” (p. 753). Items with small but systematic DIF may often go statistically unnoticed individually, yet be detected when combined at the bundle level (Nandakumar, 1993). Thus, a study of differential bundle functioning (DBF) becomes necessary in order to understand more fully the influence of gender on test performance, especially since important, although perhaps subtle, secondary dimensions associated with different testlets have been found in the TOEFL (Dunbar, 1982; Hale, Rock, & Jirele, 1989; McKinley & Way, 1992). The purpose of the present study was to explore whether a DBF analysis would reveal more evidence of differential functioning than a DIF analysis alone in the English Proficiency Test in China. In addition, the presence of secondary dimensions was investigated, since such dimensions are a common explanation for differential bundle functioning (Shealy & Stout, 1993).

Literature Review

A number of studies conducted in various contexts have confirmed the presence of gender-related differences in verbal ability and language use (Maccoby & Jacklin, 1974; Thorne et al., 1983; Tannen, 1990).

The consensus seems to be that females are superior to males in general verbal ability (Maccoby & Jacklin, 1974; Denno, 1982; Cole, 1997), but there is disagreement about which types of verbal ability show gender differences. This is especially true when it comes to different language skills. Hyde and Linn (1988) conducted a comprehensive meta-analytic study investigating gender differences in verbal ability. Among the 56 vocabulary studies included, six reported a significant difference in favour of males, while eight reported significant differences in favour of females. Overall, the meta-analysis demonstrated no significant gender difference in vocabulary, although there was significant heterogeneity in the effect sizes. In terms of reading comprehension, five of the 21 studies reported a significant difference in favour of males, while ten found significant differences in favour of females. Generally, females were found to have slight advantages in reading, speaking, writing, and general verbal ability, but the differences were so small that Hyde and Linn argued that gender differences in verbal ability no longer existed. ACT statistics for 2001 also showed no significant sex differences in English or reading, although the means of females were slightly higher than those of males (Zwick, 2002). In contrast, a gender study conducted by the Educational Testing Service (ETS) yielded completely different results. This comprehensive study (Cole, 1997) involved 400 tests and millions of students. It reported that the language advantage for females had remained unchanged compared with 30 years earlier. As indicated in Figure 1, female superiority in verbal ability ranged from noticeable differences in writing and language use to very small differences in reading and vocabulary reasoning.

At the same time, however, evidence also suggests that males are superior in listening vocabulary, that is, comprehension of heard vocabulary, in both first and second language contexts (Brimer, 1969; Boyle, 1987). In general, despite the female advantage in general verbal ability, there seems to be no agreement as to whether and to what degree gender differences exist in different types of verbal ability. In the context of second language proficiency testing, gender differences have been examined only to a limited degree, and little differential performance by gender has been found. According to Ryan and Bachman (1992), the TOEFL did not demonstrate gender DIF. Of a total of 140 test items, none was classified as 'C' (large DIF, as explained later in the paper). Of the six level B (moderate) DIF items, four favoured males and two favoured females. When subtest means were compared, no significant gender differences were found in listening, structure and written expression, or vocabulary and reading. Wainer and Lukhele (1997) also reported that the reading comprehension testlets of the TOEFL showed essentially no differential functioning by gender.

Method

Instrument and sample

The English Proficiency Test (EPT), one of the largest standardized English tests in China, was analyzed in the present study. The EPT is mainly used to assess the English proficiency of adults who plan to pursue further studies abroad at public expense. The examinees were typically university graduates with several years of work experience. Modelled after the TOEFL, the EPT includes Listening Comprehension (30 items), Grammar and Vocabulary (40 items), Cloze (20 items), Reading Comprehension (30 items), and Writing (1 item). In this study, all 120 multiple-choice items (the first four subtests) from the 1999 administration were examined. The sample included 3160 males and 1299 females.

Procedures

Differential item/bundle functioning (DIF/DBF)

Differential item functioning (DIF) analysis is a procedure often used to identify items that function differently for different groups, and thus helps monitor the validity and fairness of tests. It is based on the assumption that test takers who have similar knowledge (based on total test scores) should perform in similar ways on individual test questions regardless of their sex, race, or ethnicity. DIF occurs when an item is substantially harder for one group than for another after the overall differences in knowledge of the subject tested are taken into account. Once DIF items are detected statistically, substantive interpretation is needed to determine whether the items display bias or impact. If an item is biased, that is, it unfairly favours one group of examinees over another, the item should be deleted or revised. If the item demonstrates impact, which reflects an actual difference in knowledge between the groups on the construct of interest, the item should be retained, but further investigation may be necessary to explore why one group scored higher on it. Differential bundle functioning (DBF), a natural extension of DIF, examines the differential functioning of interpretable bundles of items instead of an individual item. The advantages of DBF lie in its increased power, its more effectively controlled Type I error, and its ability to offer insight into DIF amplification (Bolt, 2002). Items with small but systematic DIF may go statistically unnoticed individually, yet be detected when combined at the bundle level (Nandakumar, 1993). A bundle is a suspect subtest that is presumed to measure the primary dimension and a common secondary dimension, whereas the matching or valid subtest is believed to measure only the primary dimension.

Once a bundle is flagged for DBF, substantive interpretation is likewise needed to determine whether the bundle displays bias or impact.

SIBTEST

The simultaneous item bias test (SIBTEST) implements a nonparametric statistical method of assessing DIF/DBF in an item or bundle of items, based on Shealy and Stout's (1993) multidimensional model for DIF. The basic assumption is that multidimensionality produces DIF/DBF. SIBTEST detects bias by comparing the responses of examinees in the reference and focal groups who have been allocated to bins according to their scores on a "matching subtest" (Stout & Roussos, 1995). The matching subtest is a subset of items that, ideally, are known to be unbiased. Roussos and Stout (1996) proposed the following guidelines for classifying DIF on a single item with SIBTEST: (a) negligible or A-level DIF: the null hypothesis is rejected and the absolute value of beta-uni < 0.059; (b) moderate or B-level DIF: the null hypothesis is rejected and 0.059 <= the absolute value of beta-uni < 0.088; and (c) large or C-level DIF: the null hypothesis is rejected and the absolute value of beta-uni >= 0.088. For DBF, however, no guidelines exist for classifying the beta-uni values. A four-step procedure (Gierl et al., 2001) was used to identify dimensions, if any, on which there were gender differences. First, the amount of DIF for each test item was obtained using SIBTEST (Stout & Roussos, 1995), and all items with B- or C-level DIF were identified. Second, items were grouped by the four multiple-choice subtests of the EPT, and the beta-uni values for the items within each group were graphed. Third, interpretable bundles were identified by visually examining the graph and looking for groups of items that consistently favoured females or males. Fourth, the identified bundles were tested using the remaining items as the matching subtest, after deleting the items that displayed the most DIF (C-level DIF).
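The binned comparison and the Roussos and Stout (1996) cut-offs described in this section can be illustrated with a minimal Python sketch. It is not the SIBTEST program: it omits the regression correction that SIBTEST applies to the bin means, weights bins by the focal group's share of examinees, uses a normal approximation for the significance test, and takes alpha = 0.05 as the rejection threshold. All function names, the weighting choice, the sign convention, and the alpha value are assumptions made for illustration only.

```python
from collections import defaultdict
from math import erfc, sqrt
from statistics import mean, variance


def group_by_matching_score(examinees):
    """Collect studied scores into bins keyed by matching-subtest score."""
    bins = defaultdict(list)
    for matching_score, studied_score in examinees:
        bins[matching_score].append(studied_score)
    return bins


def beta_uni(reference, focal, min_per_bin=2):
    """Return (uncorrected beta-uni estimate, two-sided p-value).

    reference, focal: iterables of (matching_score, studied_score) pairs,
    where studied_score is an item score (0/1) or a bundle sum score.
    Assumed sign convention: a positive estimate means the reference group
    scores higher among examinees with the same matching score.
    """
    ref_bins = group_by_matching_score(reference)
    foc_bins = group_by_matching_score(focal)
    # Use only score levels where both groups have enough examinees.
    levels = [k for k in ref_bins
              if len(ref_bins[k]) >= min_per_bin
              and len(foc_bins.get(k, [])) >= min_per_bin]
    if not levels:
        raise ValueError("no matching-score level is shared by both groups")
    n_focal = sum(len(foc_bins[k]) for k in levels)
    beta, var = 0.0, 0.0
    for k in levels:
        ref, foc = ref_bins[k], foc_bins[k]
        weight = len(foc) / n_focal  # assumed: focal-group weighting
        beta += weight * (mean(ref) - mean(foc))
        var += weight ** 2 * (variance(ref) / len(ref)
                              + variance(foc) / len(foc))
    if var == 0.0:
        # No within-bin variation: the normal approximation is undefined.
        return beta, (1.0 if beta == 0.0 else 0.0)
    z = beta / sqrt(var)
    return beta, erfc(abs(z) / sqrt(2.0))  # two-sided normal p-value


def classify_dif(beta, p_value, alpha=0.05):
    """Apply the single-item A/B/C cut-offs quoted from Roussos and Stout."""
    if p_value >= alpha:
        return "not flagged"  # null hypothesis not rejected
    size = abs(beta)
    if size < 0.059:
        return "A"  # negligible DIF
    if size < 0.088:
        return "B"  # moderate DIF
    return "C"      # large DIF
```

In use, each examinee contributes one pair, for example `beta, p = beta_uni(male_pairs, female_pairs)` followed by `classify_dif(beta, p)`, where `male_pairs` and `female_pairs` are hypothetical lists of (matching score, item score) tuples. For a bundle, the studied score would be the sum over the bundle's items; because no classification guidelines exist for DBF, `classify_dif` is meant for single items only.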

DIMTEST

To confirm the presence of the secondary dimensions identified in the SIBTEST analyses, DIMTEST analyses were conducted. A common explanation for the occurrence of DIF/DBF is the measurement of one or more nuisance dimensions unrelated to the primary dimension that the test is intended to measure (Shealy & Stout, 1993; Roussos & Stout, 1996). While SIBTEST estimates the amount of DIF/DBF through the beta-uni index, DIMTEST provides more direct evidence about a common source of DIF/DBF: multidimensionality. The DIMTEST statistic T and the corresponding p-values are provided in the output. In this study, the DIMTEST analyses used the same bundles as the studied and matching subtests in the SIBTEST analysis.

Results

Psychometric characteristics

The psychometric characteristics of the English Proficiency Test for males and females are summarized in Tables 1 and 2. Based on the total mean scores, there was no significant difference between the male and female examinees, although males did slightly better than females. This is an advantage for the present study in that the more similar the groups, the more accurate the DIF detection (Hambleton et al., 1993). The mean differences between males and females in each of the four sections were also tested using t-tests. The results indicated that females did significantly better than males in listening comprehension, while males outperformed females in both the cloze and the grammar and vocabulary sections. With these opposing differences combined, it is not surprising that there was no overall difference between male and female examinees on the English


Similar resources

Gender Gap in Aptitude Test of Iranian EFL learners

The present study was carried out with the aim of identifying a gender gap in a recent aptitude test, the Cognitive Ability for Novelty in Acquisition of Language as applied to foreign language test (CANAL-FT), designed by Grigorenko et al. (2002). For the purposes of this research, 126 undergraduate students (95 females and 31 males) all majoring in English at Shiraz Azad University participated in ...


Inductive vs. Deductive Grammar Instruction and the Grammatical Performance of EFL Learners

Learning a foreign language poses a great challenge to students since it involves learning different skills and subskills. Quite a number of studies have examined the relationship between gender and learning a foreign language. In addition, language experts have proposed two major approaches to teaching grammar, inductive and deductive. The present study examines...


The Effect of Using Interest-based Materials on EFL Learners' Performance in Reading: Focusing on Gender Differences

Interest plays a key role in education and language learning. This study investigated if selecting and using interest-based instructional materials could impact learners' performance in L2 reading. It also examined whether there were meaningful differences between male and female learners' performance, concerning the use of interest-based materials. Sixty first-grade university students partici...


Metacognitive Strategy Awareness and Listening Anxiety: The role of gender and proficiency level among Iranian EFL learners

While listening plays an important role in the process of foreign/second language learning, different factors can affect this process. This study was designed to assess the relationship between listening anxiety and metacognitive strategy awareness with a special interest in the role of gender and proficiency level. In order to conduct this survey, two instruments including the Metacognitive Aw...




Publication date: 2004